Creating Imputation Classes Using Classification Tree Methodology
Abstract
Virtually all surveys encounter some level of item nonresponse. To address this potential source of bias, practitioners often use imputation to replace missing values with valid values through some form of stochastic modeling. To improve the reliability of such models, imputation classes are formed to produce homogeneous groups of respondents, where homogeneity is measured with respect to the item that will be imputed. A common method for forming imputation classes is Chi-squared Automatic Interaction Detection (CHAID), in which the splitting rule is based on chi-squared tests. This paper examines an alternative methodology for forming imputation classes: nonparametric classification trees in which the splitting rule is based on the Gini index of impurity, one of the splitting rules available in Classification and Regression Trees (CART). In addition to a brief description of the two classification tree methodologies, we provide some comparative examples using simple generated data and real data. Finally, we use the imputation classes with three imputation procedures: mode value imputation, proportional random imputation, and weighted sequential hot-deck. To provide an additional comparison, we model the item nonresponse using logistic regression or polychotomous regression.
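As a rough illustration of the ideas named in the abstract, the sketch below computes the Gini index of impurity, uses it to choose a single binary split that defines two imputation classes, and then fills missing items by mode value imputation or by an unweighted proportional random draw within each class. It is not the paper's implementation: the function names, the toy `rows` data, and the single "equals / not equals" split are all assumptions made for illustration, and a full CART procedure (recursive splitting, pruning, survey weights) and the weighted sequential hot-deck are not covered.

```python
import random
from collections import Counter

def gini(values):
    """Gini index of impurity: 1 - sum of squared category shares."""
    n = len(values)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())

def best_binary_split(rows, predictor, item):
    """Return (level, score) for the 'predictor == level' split with the
    lowest size-weighted Gini impurity of `item` among respondents."""
    respondents = [r for r in rows if r[item] is not None]
    best = None
    for level in sorted({r[predictor] for r in respondents}):
        left = [r[item] for r in respondents if r[predictor] == level]
        right = [r[item] for r in respondents if r[predictor] != level]
        if not left or not right:
            continue  # a split must leave respondents on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(respondents)
        if best is None or score < best[1]:
            best = (level, score)
    return best

def impute(rows, predictor, level, item, method="mode", rng=None):
    """Fill missing `item` values using respondents in the same class.
    The two classes are 'predictor == level' and 'predictor != level'."""
    rng = rng or random.Random(0)
    filled = []
    for r in rows:
        if r[item] is not None:
            filled.append(dict(r))
            continue
        in_class = r[predictor] == level
        donors = [d[item] for d in rows
                  if d[item] is not None and (d[predictor] == level) == in_class]
        if method == "mode":
            value = Counter(donors).most_common(1)[0][0]
        else:
            # Proportional random (unweighted): a uniform draw from donor
            # values picks each category with its respondent share.
            value = rng.choice(donors)
        filled.append({**r, item: value})
    return filled

# Hypothetical toy data: None marks item nonresponse.
rows = [
    {"region": "north", "employed": "yes"},
    {"region": "north", "employed": "yes"},
    {"region": "north", "employed": "yes"},
    {"region": "north", "employed": None},
    {"region": "south", "employed": "no"},
    {"region": "south", "employed": "no"},
    {"region": "south", "employed": "yes"},
    {"region": "south", "employed": None},
    {"region": "east", "employed": "no"},
    {"region": "east", "employed": "no"},
    {"region": "east", "employed": None},
]

level, score = best_binary_split(rows, "region", "employed")
mode_filled = impute(rows, "region", level, "employed", method="mode")
random_filled = impute(rows, "region", level, "employed", method="random")
print(level, round(score, 3))
print([r["employed"] for r in mode_filled])
```

The split is evaluated on respondents only, since homogeneity is measured with respect to the item being imputed; nonrespondents are then assigned to a class by their predictor value alone.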
Similar resources
Comparison of Machine Learning Algorithms for Broad Leaf Species Classification Using UAV-RGB Images
Abstract: Knowing the tree species combination of forests provides valuable information for studying the forest’s economic value, fire risk assessment, biodiversity monitoring, and wildlife habitat improvement. Fieldwork is often time-consuming and labor-intensive, free satellite data are available only at coarse resolution, and the use of manned aircraft is relatively costly. Recently, unmanned aeria...
Application of soil properties, auxiliary parameters, and their combination for prediction of soil classes using decision tree model
Soil classification systems are very useful for a simple and fast summarization of soil properties. These systems indicate the method for data summarization and facilitate connections among researchers, engineers, and other users. One of the practical systems for soil classification is Soil Taxonomy (ST). As determining soil classes for an entire area is expensive, time-consuming, and almost ...
Exploitation of Neural Methods for Imputation
In this presentation I will discuss modern imputation methods based on the neural nets methodology. The most important method used here is the Tree-Structured Self-Organising Map, or TS-SOM. The TS-SOM is a computationally fast variation of the basic Self-Organising Maps, or SOMs. It is a combination of the SOM, tree-structured clustering and computational speedup techniques. SOM is an iterativ...
Application of Different Methods of Decision Tree Algorithm for Mapping Rangeland Using Satellite Imagery (Case Study: Doviraj Catchment in Ilam Province)
The use of satellite imagery to study Earth's resources has attracted many researchers. Different phenomena have different spectral responses to electromagnetic radiation. One major application of satellite data is the classification of land cover. In recent years, a number of classification algorithms have been developed for classification of remote sensing data. One of the most nota...
Land Cover Classification Using IRS-1D Data and a Decision Tree Classifier
Land cover is one of the basic data layers in a geographic information system for physical planning and environmental monitoring. Digital image classification is generally performed to produce land cover maps from remote sensing data, particularly for large areas. In the present study the multispectral image from IRS LISS-III along with ancillary data such as vegetation indices, principal componen...
Publication date: 2006